NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

BubbleID: A deep learning framework for bubble interface dynamics analysis

https://doi.org/10.1063/5.0207546

Dunlap, C; Li, C; Pandey, H; Le, N; Hu, H (July 2024, Journal of Applied Physics)

This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetime and capturing dynamic events such as bubble departure. BubbleID is trained and tested on boiling images across diverse heater surfaces and operational settings. This paper also offers a comparative analysis of bubble interface dynamics prior to and post-critical heat flux conditions.
more » « less
Full Text Available
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

https://doi.org/10.1609/aaai.v37i3.25412

Yamazaki, K; Vo, K; Truong, Q S; Raj, B; Le, N (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Video Paragraph Captioning aims to generate a multi-sentence description of an untrimmed video with multiple temporal event locations in a coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non- visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual- linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities including (i) a global visual environment; (ii) local visual main agents; (iii) linguistic scene elements. We then introduce an autoregressive Transformer- in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee the learnt embedding features are consistent with the captions semantics. Comprehensive experiments and extensive ablation studies on the ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer- in-Transform (VLTinT) outperforms previous state-of-the-art methods in terms of accuracy and diversity. The source code is made publicly available at: https://github.com/UARK-AICV/ VLTinT.
more » « less
Full Text Available
Survey of Adirondack Metamorphic Temperatures Using Quantitative EDS Mapping

Boucher, E.; Horne, M.; Le�n, A.; Lissit, J.; Minkowitz, C.; Page, F. Z.; Roche, B. (December 2021, AGU Fall Meeting 2021)

Full Text Available
Public Concern About Monitoring Twitter Users and Their Conversations to Recruit for Clinical Trials: Survey Study. Journal of Medical Internet Research

Reuter, K.; Zhu, Y.; Angyan, P.; Le, N.; Merchant, A.; Zimmer, M. (October 2019, Journal of medical internet research)

Full Text Available

Search for: All records